Computer Architectures for Computational Physics 
work done by 

Computational Research and Technology Branch 

and 

Advanced Computational Concepts Group 
Ames Research Center 


The following slides describe the importance of having high-performance 
number crunching and graphics capability. They also indicate the types 
of research and development underway at Ames Research Center to ensure 
that, in the near term, Ames is a smart buyer and user and that, in the 
long term, we know what the best possible solutions are for our number 
crunching and graphics needs. 

The drivers for this research are real computational physics applications 
of interest to Ames and NASA. We are concerned with how to map the 
applications, how to develop the optimal system software and system 
architecture, and how to maximize the physics learned from the results 
of the calculations (which at the present time means graphics). We are 
utilizing a group of DEC- and CRAY-manufactured MIMD architectures and 
various simulation tools for larger MIMD architectures, and we also plan 
to utilize various versions of the Hypercube architecture. To control flow 
we are looking at simulations and prototypes for the study of data flow 
and systolic architectures. At present, it is a competition between the 
three architectures (MIMD, data flow, and systolic) to determine which one 
holds the most promise for the early 1990s. Once we have determined which 
one (or two) hold that promise, we will concentrate our computer science 
R&D in that area. 

The computer graphics R&D activities are directed at getting maximum 
information from our three-dimensional calculations by utilizing the 
real-time manipulation of three-dimensional data on the Silicon Graphics 
IRIS Workstation. We are also working on new algorithms which will 
permit the display of experimental results, which are sparse and random, 
in the same way we display computed results, which are dense and regular. 
This would permit the synergistic coupling of computational and 
experimental techniques. 
Related Research and Development 
Start with "real" complete applications 
Computer Graphics 


Algorithms of Interest 
Large Eddy Simulation Utilizing Spectral Methods 


Architectures 



CRAY X/MP-48 




















Performance of Multitasking on the CRAY X/MP 
Performance of Multitasking on the Dual VAX 11/780 
PPA Architecture 




Circuit-switched Network Simulation 

Motivation and Objectives 

• Understand performance of networks which 
could be used to build high-performance 
parallel architectures 

• Use a real application (LES) from Ames to 
generate data for this study 

• Understand how a real CFD problem could 
map onto a large MIMD architecture 

The Model 

• A circuit switched Omega network serving 
multiple processors connected to multiple 
modules of a shared memory 

• Queues of requests exist at each processor 
port and are served one at a time 
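
The model above can be illustrated with a minimal sketch in Python. It is not the SLAM/FORTRAN simulator described on these slides; it only shows, under the textbook assumption of destination-tag routing through a perfect-shuffle Omega network of 2x2 switches, how two circuits can block each other. The names omega_path and grant_circuits are hypothetical and chosen for illustration.

```python
def omega_path(src, dst, k):
    """Links held by a circuit from port src to port dst.

    Assumes an Omega network with 2**k ports and k stages of 2x2
    switches, routed by destination tag (an illustrative assumption,
    not a detail taken from the slides).
    """
    n = 1 << k
    pos = src
    path = []
    for stage in range(k):
        # Perfect shuffle: rotate the k-bit line address left by one.
        pos = ((pos << 1) | (pos >> (k - 1))) & (n - 1)
        # The switch output is selected by the destination bit, MSB first.
        bit = (dst >> (k - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append((stage, pos))
    return path


def grant_circuits(requests, k):
    """Grant requests in order; a circuit is blocked if any link is busy."""
    busy = set()
    granted = []
    for src, dst in requests:
        path = omega_path(src, dst, k)
        if not any(link in busy for link in path):
            busy.update(path)
            granted.append((src, dst))
    return granted


if __name__ == "__main__":
    # Two requests to different memory modules that still collide
    # on a shared first-stage link in a 4-port network.
    print(grant_circuits([(0, 0), (2, 1)], k=2))
```

In a circuit-switched network a blocked request has to wait in its port queue until the conflicting circuit is released, which is why the access pattern of the application matters so much for the delivered bandwidth.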

Construction of the Simulator 

• Discrete event simulation facility of SLAM 
driven by FORTRAN subroutines 

• Statistics collected on service times 
Bandwidth of Network for Various Cases 

Three cases: 

• Real data from a CFD code (LES) 

• Random data 

• Infinite vectors with p=1 


Total Bandwidth in MW/sec. 

  n    MAX    Random    Vectors    Actual 
  8     36    12.5       5.52       5.75 
 16     67    12.2       5.60       5.62 
 32    123     5.12      5.12       5.24 
 64    229     5.76      4.16       4.36 
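
As a rough check on these figures, the short Python sketch below divides the Actual column by the MAX column and by the number of ports n. It assumes that MAX is the theoretical peak for n ports and that Actual is the LES-driven case; neither reading is stated explicitly on the slide. The function names are hypothetical.

```python
# (n, MAX, Random, Vectors, Actual) in MW/sec, copied from the table.
BANDWIDTH = [
    (8, 36, 12.5, 5.52, 5.75),
    (16, 67, 12.2, 5.60, 5.62),
    (32, 123, 5.12, 5.12, 5.24),
    (64, 229, 5.76, 4.16, 4.36),
]


def fraction_of_max(rows=BANDWIDTH):
    """Actual bandwidth as a fraction of the MAX column, keyed by n."""
    return {n: actual / peak for n, peak, _rand, _vec, actual in rows}


def per_port(rows=BANDWIDTH):
    """Actual bandwidth divided by the number of ports, keyed by n."""
    return {n: actual / n for n, _peak, _rand, _vec, actual in rows}


if __name__ == "__main__":
    frac = fraction_of_max()
    port = per_port()
    for n in frac:
        print(f"n={n:3d}  actual/MAX={frac[n]:.3f}  per port={port[n]:.3f} MW/sec")
```

Both ratios fall as n grows, because the Actual column stays roughly flat (and drops slightly) while MAX and n increase.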


For comparison look at Crays: 


Maximum Bandwidth in MW/sec. 

Machine       Bandwidth 
Cray 1            80 
Cray X-MP        631 
512*512         1500 
Conclusions from the Network Simulation 

Modelling network traffic with streams of random 
data can be very misleading since actual codes 
exhibit a very different behavior 

The bandwidth of the network does not increase 
linearly with the number of ports 

A circuit-switched network such as this is far too 
slow to be useful for building high-performance 
MIMD architectures 
System Architecture of a Systolic Attached Processor 











Static Data Flow Machine Architecture 




RN Routing Network. 512 by 512, 16 bit data paths, operates at > 
5MHz, average rate of transmitting FP packets 0.25 MHz from a 
single PE to another. 

PE Processing Elements. 5 to 8 MFLOPS with two 1.25 to 2 
MFLOP multipliers. 256 PE’s in the system. 

IS Instruction Store. 1024 cells for FP instructions, 1024 for others. 

AM Array Memory. Size not fully determined. At least 256K 64 bit 
words per PE. 

IO Input Output. Includes mass memory, host processor, and 
display systems. 256 paths through the RN are reserved for IO. 
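
The component figures above imply system-level totals that the slide does not state. The Python sketch below works them out, assuming that "256K" means 256 x 1024 words and that the per-PE rates simply add; both are assumptions, and the function names are hypothetical.

```python
# Figures quoted on the slide.
NUM_PES = 256                 # processing elements in the system
MFLOPS_PER_PE = (5, 8)        # quoted range per PE
WORDS_PER_PE = 256 * 1024     # "at least 256K" words; K = 1024 is assumed
BYTES_PER_WORD = 8            # 64-bit words


def aggregate_peak_mflops(num_pes=NUM_PES, per_pe=MFLOPS_PER_PE):
    """Sum of per-PE rates, ignoring routing-network limits."""
    low, high = per_pe
    return (num_pes * low, num_pes * high)


def minimum_array_memory_bytes(num_pes=NUM_PES,
                               words_per_pe=WORDS_PER_PE,
                               bytes_per_word=BYTES_PER_WORD):
    """Lower bound on total array memory across all PEs."""
    return num_pes * words_per_pe * bytes_per_word


if __name__ == "__main__":
    low, high = aggregate_peak_mflops()
    print(f"aggregate peak: {low} to {high} MFLOPS")
    print(f"array memory: at least {minimum_array_memory_bytes() // 2**20} MiB")
```

These are upper and lower bounds only; the sustained rate would depend on how well the routing network keeps the processing elements supplied with packets.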
Status of Data Flow Simulator 
Questions to be Answered by the Simulator 
What is the best way to distribute instructions across the processing elements? 
computational physics applications 


Organization of Data Flow Simulator 
DRIVER will run on the VAX to allow interactive use; TRANSLATOR and SIMULATOR will run on the Cray because of the length of the runs. 


